{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Step 2: Adding Object Detections to a FiftyOne Dataset\n",
        "In our first step, we will be covering how you can add object detections to your dataset. First we will go through how to add predictions using the FiftyOne Model Zoo and apply_model. In the second part, we will demonstrate how to add your detection predictions from your own custom model or labels. Feel free to skip ahead if you are interested in only adding object detections with your own model or labels!\n",
        "\n",
        "\n",
        "## Using the Model Zoo\n",
        "Let's kick things off by loading in the [MSCOCO 2017](https://cocodataset.org/#home) validation split from the [FiftyOne Dataset Zoo](https://docs.voxel51.com/dataset_zoo/datasets.html). We will cap it to a max of 1000 samples:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import fiftyone as fo\n",
        "import fiftyone.zoo as foz\n",
        "\n",
        "dataset = foz.load_zoo_dataset(\"coco-2017\", split=\"validation\", max_samples=1000)\n",
        "\n",
        "session = fo.launch_app(dataset)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "vscode": {
          "languageId": "bat"
        }
      },
      "source": [
        "With FiftyOne, you have tons of pretrained models at your disposal to use via the [FiftyOne Model Zoo](https://docs.voxel51.com/model_zoo/index.html) or using one of our [integrations](https://docs.voxel51.com/integrations/index.html) such as [HuggingFace](https://docs.voxel51.com/integrations/huggingface.html)! To get started using them, first load the model in and pass it into the apply_model function. \n",
        "\n",
        "We will use [retinanet-resnet50-fpn-coco-torch](https://docs.voxel51.com/model_zoo/models.html#retinanet-resnet50-fpn-coco-torch) from the model zoo first!"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            " 100% |███████████████| 1000/1000 [7.2m elapsed, 0s remaining, 2.3 samples/s]      \n"
          ]
        }
      ],
      "source": [
        "model = foz.load_zoo_model(\"retinanet-resnet50-fpn-coco-torch\")\n",
        "dataset.apply_model(model, label_field=\"zoo_predictions\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's visualize our results!"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "session.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![zoo-predictions](https://cdn.voxel51.com/zoo-predictions.webp)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Adding Predictions using Ultralytics\n",
        "Thanks to [FiftyOne's integration](https://docs.voxel51.com/integrations/ultralytics.html) with [Ultralytics](https://github.com/ultralytics/ultralytics), we can pass any Ultralytics YOLO model into apply_model as well!"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "!pip install ultralytics"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            " 100% |███████████████| 1000/1000 [46.5s elapsed, 0s remaining, 21.5 samples/s]      \n"
          ]
        }
      ],
      "source": [
        "from ultralytics import YOLO\n",
        "\n",
        "# YOLOv8\n",
        "model = YOLO(\"yolov8s.pt\")\n",
        "\n",
        "# model = YOLO(\"yolov8m.pt\")\n",
        "# model = YOLO(\"yolov8l.pt\")\n",
        "# model = YOLO(\"yolov8x.pt\")\n",
        "\n",
        "# YOLOv5\n",
        "# model = YOLO(\"yolov5s.pt\")\n",
        "# model = YOLO(\"yolov5m.pt\")\n",
        "# model = YOLO(\"yolov5l.pt\")\n",
        "# model = YOLO(\"yolov5x.pt\")\n",
        "\n",
        "# YOLOv9\n",
        "# model = YOLO(\"yolov9c.pt\")\n",
        "# model = YOLO(\"yolov9e.pt\")\n",
        "dataset.apply_model(model, label_field=\"YOLOv8\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![yolo-predictions](https://cdn.voxel51.com/yolo-predictions.webp)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Adding Predictions from Custom Model\n",
        "When bringing your own model to add predictions to your dataset, you can add [detection labels](https://docs.voxel51.com/user_guide/using_datasets.html#object-detection) directly to each sample! The __most__ important part to remember is that FiftyOne uses `[nx, ny, nw, nh]` bounding box format, or normalized x,y,w,h notation. This means that each value in the bounding box is between (0,1). Below is a sample function that converts an `xyxy` box to `nxywh`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {},
      "outputs": [],
      "source": [
        "def convert_xyxy_boxes(sample, boxes):\n",
        "    new_boxes = []\n",
        "    \n",
        "    for box in boxes:\n",
        "        \n",
        "        # Normalize X and Y by width and height\n",
        "        nx = box[0] / sample.metadata.width\n",
        "        ny = box[1] / sample.metadata.height\n",
        "        \n",
        "        # Calculate width and height and normalize as well\n",
        "        nw = (box[2] - box[0]) / sample.metadata.width\n",
        "        nh = (box[3] - box[1]) / sample.metadata.height\n",
        "        new_box = [nx, ny, nw, nh]\n",
        "        new_boxes.append(new_box)\n",
        "        \n",
        "    return new_boxes"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For our custom model in this example, we will be using torchvision [FasterRCNN_Resnet50](https://docs.voxel51.com/user_guide/using_datasets.html#object-detection). The pattern for adding custom labels looks like this:\n",
        "\n",
        "1. Load the sample image\n",
        "2. Perform any necessary preprocessing\n",
        "3. Inference on the image\n",
        "4. Grab the prediction and confidence of the model_output\n",
        "5. Adjust the bounding box if needed\n",
        "6. Add the values as a label to your sample\n",
        "\n",
        "Let's walkthrough them below!"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from torchvision.io.image import read_image\n",
        "from torchvision.models.detection import fasterrcnn_resnet50_fpn_v2, FasterRCNN_ResNet50_FPN_V2_Weights\n",
        "\n",
        "weights = FasterRCNN_ResNet50_FPN_V2_Weights.DEFAULT\n",
        "model = fasterrcnn_resnet50_fpn_v2(weights=weights, box_score_thresh=0.9)\n",
        "\n",
        "# Compute Metadata to collect each samples width and height\n",
        "dataset.compute_metadata() \n",
        "\n",
        "for sample in dataset:\n",
        "    # Step 1: Load the image\n",
        "    image = read_image(sample.filepath)\n",
        "\n",
        "    # Step 2: Preform preprocessing\n",
        "    preprocess = weights.transforms()\n",
        "\n",
        "    batch = [preprocess(image)]\n",
        "\n",
        "    # Step 3: Inference on the image\n",
        "    model.eval()\n",
        "    prediction = model(batch)[0]\n",
        "    \n",
        "    # Step 4: Grab the prediction and confidence\n",
        "    labels = [weights.meta[\"categories\"][i] for i in prediction[\"labels\"]]\n",
        "    confs = prediction[\"scores\"].tolist()\n",
        "\n",
        "    # Step 5: Convert the boxes to FiftyOne format\n",
        "    fo_boxes = convert_xyxy_boxes(sample, prediction[\"boxes\"].tolist())\n",
        "    detections = []\n",
        "    \n",
        "    # Step 6: Add to your sample\n",
        "    for cls, box, conf in zip(labels, fo_boxes, confs):\n",
        "        \n",
        "        det = fo.Detection(label=cls, bounding_box=box, confidence=conf)\n",
        "        detections.append(det)\n",
        "        \n",
        "    sample[\"torchvision\"] = fo.Detections(detections=detections)\n",
        "    sample.save()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Finally, we can see all of our results in the FiftyOne App!"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "session.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![torchvision-predictions](https://cdn.voxel51.com/torchvision-predictions.webp)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Summary\n",
        "\n",
        "You've added object detections using Model Zoo models, Ultralytics YOLO, and custom models. Remember: FiftyOne uses normalized `[nx, ny, nw, nh]` bounding box format.\n",
        "\n",
        "Next up: **Step 3 covers finding detection mistakes**"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "OSS310",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.16"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}
